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Abstract 

A new ensemble filter that allows for the uncertainty in the prior distribution is pro¬ 
posed and tested. The filter relies on the conditional Gaussian distribution of the state 
given the model-error and predictability-error covariance matrices. The latter are treated 
as random matrices and updated in a hierarchical Bayes scheme along with the state. The 
(hyper)prior distribution of the covariance matrices is assumed to be inverse Wishart. The 
new Hierarchical Bayes Ensemble Filter (HBEF) assimilates ensemble members as general¬ 
ized observations and allows ordinary observations to influence the covariances. The actual 
probability distribution of the ensemble members is allowed to be different from the true 
one. An approximation that leads to a practicable analysis algorithm is proposed. The 
new filter is studied in numerical experiments with a doubly stochastic one-variable model 
of “truth”. The model permits the assessment of the variance of the truth and the true 
filtering error variance at each time instance. The HBEF is shown to outperform the EnKF 
and the HEnKF by Myrseth and Omre (2010) in a wide range of filtering regimes in terms 
of performance of its primary and secondary filters. 


1 Introduction 


Stochastic filtering and smoothing is a mathematical name for what is called in natural sci¬ 
ences data assimilation. Whenever we have three things: (1) an evolving system whose state 
is of interest to us, (2) an imperfect mathematical model of the system, and (3) incomplete 
and noise-contaminated observations, there is room for data assimilation. Currently, data as¬ 
similation techniques are extensively used in geophysics: meteorology, atmospheric chemistry. 


oceanography, land hydrology (e.g. Lahoz et al. 


2010), underground oil reservoir modeling Oliver] 


et al. (2011), biogeochemistry Trudinger et al. (2008), geomagnetism Fournier et al. (2010), and 


being explored in other areas like systems biology Yoshida et al. (2008), epidemiology Rhodes 


and Hollingsworth (2009), ecology Niu et al. (2014), and biophysics Chapelle et al. (2013). Data 


assimilation techniques have reached their most advanced level in meteorology. 

To simplify the presentation of our technique, we confine ourselves to sequential discrete¬ 
time filtering, whose goal is to estimate the current state of the system given all present and 
past observations. This is a cycled procedure, each cycle consists of an observation update step 
(called in meteorology analysis) when current observations are assimilated, and a time update 
(forecast) step that propagates information on past observations forward in time. 
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1.1 Stochastic models of uncertainty 


Virtually all advanced data assimilation methods rely on stochastic modeling of the underlying 
uncertainties in observations and in the forecast model. Historically, the first breakthrough in 
meteorological data assimilation was the introduction of the stochastic model of locally homoge¬ 
neous and isotropic random fields and the least squares estimation approach based on correlation 
functions (optimal interpolation by Eliassen Eliassen (1954) and Gandin Gandin (1965)). The 


second big advancement was the development of global multivariate forecast error covariance 
models no longer based on correlation functions but relying on more elaborate approaches like 
spectral and wavelet models, spatial filters, diffusion equations, etc. Rabier et al. (1998); Eisher] 
(2003); [Peckmyn and Berre (2005); Purser and Wu (2003); Weaver and Courtier (2001); these 
(estimated “off-line”) forecast-error models have been utilized in so-called variational data as¬ 


similation schemes (e.g. Rabier et al., 1998). The third major invention so far was the Ensemble 


Kalman Eilter (EnKE) by Evensen Evensen (1994), in which the uncertainty of the system 
state is assumed to be Gaussian and represented by a Monte Carlo sample (ensemble), so that 
static forecast error covariance models are replaced by dynamic and flow-dependent ensemble 
covariances. The EnKE has then developed into a wide variety of ensemble based techniques 
including ensemble-variational hybrids, e.g. Houtekamer and Mitchell (2001); Buehner et al.| 


(2013); Lorenc et al. (2014). 


There is another class of non-parametric Monte Carlo based filters called particle filters (e.g. 


van Leeuwen, 2009). They do not rely on the Gaussian assumption and thus are better suited 


to tackle highly nonlinear problems, but the basic underlying idea of representing the unknown 
continuous probability density by a sum of a relatively small number of delta functions looks 
attractive for low-dimensional systems, whereas in high dimensions, its applicability remains to 
be convincingly shown. We do not consider particle filters in this paper. 

In this research, we propose to retain a kind of Gaussianity because a parametric prior 
distribution has the advantage of bringing a lot of regularizing information in the vast areas 
of state space, where there are no nearby ensemble members. But we are going to relax the 
Gaussian assumption replacing it by a more general conditionally Gaussian model. 


1.2 Uncertainty in the forecast error distribution 


In the traditional EnKE, the forecast (background) uncertainty is characterized by the forecast 
error covariance matrix B, which is estimated from the forecast ensemble. The problem is that 
this estimate cannot be precise, especially in high-dimensional applications of the EnKE, where 
the affordable ensemble size is much less than the dimensionality of state space. So, the forecast 


uncertainty in the EnKE is largely uncertain by itself (Eurrer and Bengtsson, 2007; Sacher and 


Bartello, 2008). On the practical side, a common remedy here is a kind of regularization of the 


sample covariance matrix (e.g. Eurrer and Bengtsson, 2007). But these techniques (of which 


the most widely used is covariance localization or tapering) are more or less ad hoc and have 
side effects, so a unifying paradigm to optimize the use of ensemble data in filtering is needed. 
On the theoretical side, there is an appropriate way to account for this uncertain uncertainty: 


hierarchical Bayes modeling (e.g. Robert (2007)) 


1.3 Hierarchical Bayes estimation 

In the classical non-Bayesian statistical paradigm, the state x [parameter in statistics) is con¬ 
sidered to be non-random being subject of estimation from random forecast and random obser¬ 
vations. Optimal interpolation is an example. 

In the non-hierarchical Bayesian paradigm, both observations y and the state x are regarded 
as random. At the first level of the hierarchy, one specifies the observation likelihood p(y |x). As x 
is random, one introduces the second level of the hierarchy, the probability distribution of x that 
summarizes our knowledge of the state x before current observations y are taken into account, 
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the prior distribution p(x|i9). Here i? is the non-random vector of parameters of the prior 
distribution (called hyperparameters). So, the non-hierarchical Bayesian modeling paradigm is, 
essentially, a two-level hierarchy (y|x and x|i9). In the analysis, the prior density p(x|i9) is 
updated using the observation likelihood p(y|x) leading to the posterior density p(x|y). Note 
that the analysis step in the Kalman Filter can be viewed as an example of the two-level Bayesian 
hierarchy, in which the prior (x|b, B) is the Gaussian distribution with the hyperparameter b 
being the predicted ensemble mean vector and the hyperparameter B the predicted ensemble 
covariance matrix. Variational assimilation can be regarded as a similar two-level Bayesian 
hierarchy with b being the deterministic forecast and B the pre-specified covariance matrix. 

In the hierarchical Bayesian paradigm, not only observations and the state are random, the 
prior distribution is also assumed to be random (uncertain). Specifically, the hyperparameters 
'd are assumed to be random variables having their own (hyper)prior distribution governed by 
hyperhyperparameters 7 . If 7 are non-random, then we have a three-level hierarchy (y|x, x|i9, 
and i?|7). The meaningful number of levels in the hierarchy depends on the observability of 
the higher-level hyperparameters: a hyperparameter is worth to be considered as random and 
subject of update if it is “reasonably” observed. We will rely in this study on a three-level 
hierarchy with the prior covariances as the random hyperparameter. 

Historically, Le and Zidek Le and Zidek (2006) introduced uncertain covariance matrices in 


the static geostatistical non-ensemble estimation framework known as Kriging. Berliner Berliner 


(1996) proposed to use the hierarchical Bayesian paradigm to account for uncertainties in pa¬ 
rameters of error statistics used in data assimilation. Within the EnKF paradigm, Myrseth and 
Omre Myrseth and Omre (2010) added b and B to the traditional control vector assuming that 

x|b, B) 

Bocquet Bocquet (2011) took a different path and treated b and 


B is the inverse Wishart distributed random matrix and the distributions b|B and 
are multivariate Gaussian. 


B as nuisance variables to be integrated out rather than updating them as components of the 
control vector. His filter (developed further in Bocquet and Sakov (2012); [Bocquet et al. (2015)) 
imposed prior distributions for random b and B in order to change the Gaussian prior of the 
state X to a more realistic continuous mixture of Gaussians. 

In this study, we follow the general path of Myrseth and Omre (2010). We propose to split B 
into the model error covariance matrix Q and the predictability error covariance matrix P. The 
reason for such splitting is the fundamentally different nature of model errors (which are external 
to the filter) vs. predictability errors (which are internal, i.e. determined by the filter). At the 
analysis step, following the hierarchical Bayes paradigm, we update P and Q along with the 
state X using both observation and ensemble data. Performance of the new filter is thoroughly 
tested in numerical experiments with a one-variable model. Note that the observation error 
covariance matrix is assumed to be precisely known in this study. 


2 Background and notation 

We start by outlining filtering techniques that have led to our approach, indicating those of their 
aspects that are relevant for this paper. Thereby, we introduce the notation; the whole list of 
main symbols can be found in[Dt 

2.1 Bayesian filtering 

The general Bayesian filtering paradigm assumes that unknown systems states x^ G M"" (where 
k = 0,1,... denotes the time instance and n the dimension of the state space) are random, 
subject to estimation from random observations yi,*, = (yi,...,yfc). The true system states 
obey a Markov stochastic evolutionary model such that the transition density p(xfc|xfc_i) is 
available. Observations are related to the truth through the observation likelihood p{yk\^k)- 
The optimal filtering process consists in alternating forecast and analysis steps. At the forecast 


3 






















step the predictive density p^x^lyi-.k-i) is computed. The goal of the analysis step is to compute 
the filtering density p{xk\yi:k)- 

At the analysis step, the predictive density is regarded as a prior density, which we denote by 
the superscript / (from “forecast”): p-^ (x^) = p(xfc|yi.fc_i). The filtering density can similarly 
be viewed as the posterior density denoted by the superscript a (from “analysis”): p°'{xk) = 

p(xfc|yi;fc). 

Direct computations of the predictive and filtering densities are feasible only for very low¬ 
dimensional problems. This difficulty can be alleviated if we turn to linear systems. 

2.2 Linear observed system 

The evolution of the truth is governed by the discrete-time linear stochastic dynamic system: 

Xk = FkXk-i +Sk, ( 1 ) 

where the (linear) forecast operator, £k ~ AA(0, Qk) the model error, and Qk the model error 
covariance matrix. Observations y^ are related to the state through the observation equation 

yk = HfcXfc -F r]k, (2) 

where is the (linear) observation operator, rjk ~ AA(0, R^) the observation error, and the 
observation error covariance matrix. 


2.3 Prior and posterior covariance matrices 

Here we introduce the prior, posterior, and predictability covariance matrices, which will be 
extensively used throughout the paper. By = Exfc|yi.fc_i, we denote the mean of the prior 
distribution and by 

Bfc = E [(xfc - bfc)(xfc - bfc)’^|yi:fc_i] (3) 

the prior covariance matrix. Similarly, a^ = Ex^lyi-^ is the posterior mean and 

Afc = E [(xfc - afc)(xfc - afc)’^|yi:fc] (4) 

the posterior covariance matrix. With the linear dynamics defined in Eq. ^,hk and Bfc satisfy 
the equations 

bfc = E [FfcXfc_i -E ek\yv.k-i] = Ffc a^.i (5) 

and 

Bfc = E [(Ffc(xfc_i - afc_i) -E Sk) ■ (Ffc(xfc_i - a^-i) -E efc)"^|yi:fc-i] = Pfc + Qk, (6) 

where 

Pk = FkAk-iFj ( 7 ) 

is the predictability (error) covariance matrix. 


2.4 Kalman filter 


For the linear system introduced in section 
Kalman filter (KF). Its forecast step is 


2.2 


the mean-square optimal linear filter is the 


4 = 


( 8 ) 


where, we recall, the superscripts / and a stand for the forecast and analysis filter estimates, 
respectively. The analysis update is 

Xfc = x{ -E Kfc(yfc - Hfcx{), (9) 
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( 10 ) 


where is the so-called gain matrix: 
The posterior covariance matrix is 


Afc = (I-KfcHfc)Bfc. (11) 

Note that Eqs.([^ and ([^ constitute the so-called primary filter Dee et ah] ( 1985[ ), in which the 
estimates of the state are updated. The primary filter uses the forecast error covariance matrix 
Bfc computed in the secondary filter, which is comprised of Eqs. I®,©, (§ , and (0. 


2.4.1 Remarks 

1. The KF’s forecast and analysis are exactly the prior mean and the posterior 
mean a^, respectively. Therefore the above prior and posterior covariance matrices B^ 
and Afc have also the meaning of the error covariance matrices of the filter’s forecast and 
analysis, respectively. 

2. The KF’s secondary filter uses only observation operators and not observations themselves. 
As a consequence, the conditional covariance matrices B^, A^, and coincide with their 
unconditional counterparts, B^, A^, and (this fact will be utilized below in section 
4 ^ . 

3. The KF produces forecast and analysis estimates x^ and x^ that are the best in the mean- 
square sense among all linear estimates. The KF estimates become optimal among all 
estimates if the involved error distributions are Gaussian. For highly non-Gaussian distri¬ 
butions, the KF can be significantly sub-optimal, so the (near) Gaussianity is implicitly 
assumed in the KF (this holds for the ensemble KF as well). 

The KF is still prohibitively expensive in high dimensions. This motivated the introduction 
and wide spread in geophysical and other applications of its Monte Garlo based approximation, 
the ensemble KF. 


2.5 Ensemble Kalman filter (EnKF) 


As compared with the KF, the EnKF replaces the most computer-time demanding step of 
forecasting P^ (via Eq.Q) by its estimation from a (small) forecast ensemble. Members of this 
ensemble, x^^(i) (where /e denotes the forecast ensemble, i = 1,..., N, and N is the ensemble 
size) are generated by replacing the two uncertain quantities in Eq. 0 > Xfc_i and Sfc, by their 
simulated counterparts, x^*ij^(i) and e^(i), respectively: 

x{'{i)=Fixr_i(i)+4(i). (12) 


Here the superscript ae stands for the analysis ensemble (see below in this subsection) and the 
superscript e for a simulated pseudo-random variable. Then, the sample {i)}fLi is used 
to compute the sample (ensemble) mean and the sample covariance matrix Sfc. The Kalman 
gain Kfc is computed following Eq.(lO), in which B^ is a somehow regularized (normally. 


by applying variance inflation and spatial covariance localization, (e.g. Furrer and Bengtsson 


2007)). 


The analysis ensemble = {x^®(z)} is computed either deterministically by transforming 
the forecast ensemble (e.g. Tippett et al.[ 2003), or stochastically (e.g. Houtekamer and Mitchell 


2001). In this study, we make use of the stochastic analysis ensemble generation technique, in 


which the observations are perturbed by adding their simulated observation errors r]^{i) ~ 
AA(0,R) and then assimilated using xl^(i) as the background: 

+ Kfc(yfc + T]%i) - Hx-''^(i)). (13) 


Note that in practical applications, the forecast operator is allowed to be nonlinear. 
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2.6 Methodological problems in the EnKF that can be alleviated nsing the 
hierarchical Bayes approach 

1. In most EnKF applications, the prior covariance matrix is largely uncertain due to the 
insufficient ensemble size, which is not optimally accounted for. As a result, the filter’s 
performance degrades. 

2. In the EnKF analysis equations, there is no intrinsic feedback from observations to the 
forecast error covariances. The secondary filter is completely divorced from the primary 
one. This underuses the observational information (because observation-minus-forecast 
differences do contain information on forecast-error covariances) and requires external 
adaptation or manual tuning of the filter. 

2.7 Hierarchical filters 

By hierarchical filters, we mean those that aim at explicitly accounting for the uncertainties in 
the filter’s error distributions using hierarchical Bayesian modeling. 


2.7.1 Hierarchical Ensemble Kalman filter (HEnKF) by Myrseth and Omre 


Myrseth and Omre (2010) 


Myrseth and Omre Myrseth and Omre (2010) were the first who used the Hierarchical Bayes 


approach to address the uncertainty in the forecast error covariance matrix within the EnKF. 
Here we outline their technique using our notation. To simplify the comparison of their filter 
with ours, we assume that the dynamics are linear and neglect the uncertainty in the prior mean 
vector bfc identifying it with the deterministic forecast x^. The HEnKF differs from the EnKF 
in the following respects. 


(i) Bfc is assumed to be a random matrix with the inverse Wishart prior distribution: Bfc ~ 

XW(0,B^), where 6 is the scalar sharpness parameter and B^ the prior mean covariance 
matrix (see our [a| . B^ is postulated to be equal to the previous-cycle posterior mean 
covariance matrix. 

(ii) The forecast ensemble members are assumed to be drawn from the Gaussian distribution 

AA(bfc,Bfc), where B^ is the true forecast error covariance matrix. 

(iii) Having the inverse Wishart prior for B^ and independent Gaussian ensemble members 
drawn from A/'(bfc, B^.) implies that these ensemble members can be used to refine the prior 
distribution of B^. The respective posterior distribution of B^ is again inverse Wishart 
with the mean B^ equal to a linear combination of B(, and the ensemble covariance matrix 
Sfc (see our i). 

(iv) In generating the analysis ensemble members x^®(z), the HEnKF perturbs not only obser¬ 
vations (as in the EnKF) but also simultaneously the B^ matrix according to its posterior 
distribution. 


The HEnKF was shown to outperform the EnKF in numerical experiments with simple low-order 
models for small ensemble sizes, as well as with an intermediate complexity model without model 
errors for a constant field Myrseth and Omre (2010|. 


2.7.2 EnKF-N “without intrinsic need for inflation” by Bocqnet et al. Bocque^ 
( ]2011 ); Bocqnet and Sakov (2012); Bocqnet et al. (2015) 


In the EnKF-N, the prior mean and covariance matrices are assumed to be uncertain nuisance 
parameters with non-informative Jeffreys prior probability distributions. There is also a variant 
of the EnKF-N with an informative Normal-Inverse-Wishart prior for (b, B). With the Gaussian 
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conditional distribution of the truth (x|b, B) and the perfect ensemble, the unconditional distri¬ 
bution of the truth given the forecast and the ensemble is analytically tractable and is proposed 
to replace, in the EnKF-N, the traditional Gaussian prior. The resulting analysis algorithm 
involves a non-quadratic minimization problem, which, as the authors argue, can be feasible in 
high-dimensional problems. 

In numerical experiments with low-order models, the EnKF-N without a superimposed in¬ 
flation was shown to be competitive with the EnKF with optimally tuned inflation. There were 
also indications that the EnKF-N can reduce the need in covariance localization. 


2.7.3 Need for further research 


Returning to the list of the EnKF’s problems (section |2.6[ ), we note that the HEnKF does address 
the first problem (the uncertainty in B^), but it does not address the second one (absence of 
feedback from observations to covariances in the EnKF). Next, assumption (ii) in section [2. 7.1 


IS 


too optimistic, which will be discussed below in section [3i5| when we introduce our filter. Finally, 
the HEnKF is going to be very costly in high dimensions because of the need to sample from 


an inverse Wishart distribution. (Myrseth and Omre Myrseth and Omre (2010) note, though, 
that this computationally heavy sampling can be dropped, but, to the authors’ knowledge, this 
opportunity has not yet been tested.) 


The EnKF-N addresses both problems mentioned in section 2.6 but it relies on the assump¬ 


tion that forecast ensemble members are drawn from the same distribution as the truth (like the 
HEnKF relies on its assumption (ii)). As we will argue in section 3.5 this cannot be guaran¬ 


teed if background error covariances are uncertain. Besides, the EnKF-N has no memory in the 
covariances (as it does not explicitly update them). As we show below, updating and cycling 
the covariances can be useful. 

Thus, both the HEnKF and the EnKF-N are important first contributions to the area of 
hierarchical filtering, but there is a lot of room in this area for further improvements and new 
approaches. This study presents one of them. 


3 Hierarchical Bayes Ensemble (Kalman) Filter (HBEF) 

3.1 Setup and idea 

We formulate the HBEF for linear dynamics and linear observations, see Eqs. 0 and 0. Ob¬ 
servation errors are Gaussian. Other settings come, mainly, from the formulation of conditions 
under which the EnKF actually works in geophysical applications: 

1. The ensemble size is too small for sample covariance matrices to be accurate estimators. 

2. The direct computation of the predictability covariance matrix P^, as F/cAfc_iFj is un¬ 
feasible. 

3. The model error covariance matrix is temporally variable and explicitly unknown. 

We also hypothesize that 

4. Gonditionally on Q^, the model errors are zero-mean Gaussian: ek\Q,k ~ AA(0, Q^). 

5. We can draw independent pseudo-random samples from Af{0, Q^) with the true Q^. 

Under these assumptions, the KF theory cannot be applied. In this research, we propose a theory 
and design a filter (the HBEF) that acknowledge in a more systematic way than this is done in 
the EnKF that the covariance matrices and are substantially uncertain. We regard 
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and Pfc as additional (to the state x^) random matrix variate variables to be estimated along 
with the state. We represent both the prior and the posterior distributions hierarchically: 


p(x, P, Q) = p(P, Q) • p(x|P, Q) 


(14) 


and advance in time the two densities in the r.h.s. of this equation. Thereby the conditional 
density p(x|P, Q) is shown below to remain Gaussian. This point is central to our approach. 
As for the marginal density p(P, Q), its exact evolution appears to be unavailable, so we intro¬ 
duce approximations to the prior, postulating it to be static and based on the inverse Wishart 
distribution at any assimilation cycle. 

Actually, not only and Pfc are uncertain, the prior conditional mean is uncertain 
as well. But to simplify the presentation of our approach, we disregard the uncertainty in 
and assume that b^ = x^, where x^ is the deterministic forecast. This implies that remark 
in section |2.4.1 applies here, therefore we will use the terms “prior” and “forecast error” 
interchangeably (and similarly for “posterior” vs. “analysis error”). 

A notational comment is in order. To avoid confusion of a point estimate (produced by a 
filter) with its true counterpart, we mark the former with a superscript (/ or a) or the tilde. 
E.g. is the analysis point estimate of the true prior variance B^. 


3.2 Observation and ensemble data to be assimilated 

The HBEF aims to optimally assimilate not only conventional observations but also ensemble 
members. To estimate Qfc and Pfc, we split the forecast ensemble (computed on the interval 
between the time instances k — 1 and k) = (x^'^(l),... ,xjf{N)) into two ensembles. The 
first one is the model error ensemble = (x™'^(l),..., x™®(X)), whose members are pseudo¬ 
random draws from the true distribution of the model errors. The second ensemble is the 
predictability ensemble X|^ = (x^®(l),... ,x^^{N)) defined to be the result of the application of 
the forecast operator to the previous-cycle analysis ensemble X^l^. The latter is generated 
by the filter to represent the posterior distribution of the truth (see below). 

Note that this splitting of the forecast ensemble does not imply that the ensemble size is 
doubled. In the course of the traditional forecast ensemble, we suggest preventing model error 
perturbations from being added to the model fields while accumulating them in the model error 
ensemble members. 

We denote the combined (observation and ensemble) data at the time k as = 
(yfc, X™*^, X^*^). To assimilate these data, we need the respective likelihoods. 

3.3 Observation likelihood 

The Gaussianity of observation errors implies that the observation likelihood is, by definition, 

p(yfc|xfc) oc e- 3 (yk-Hxk)TRk'(yk-Hxg^ ^^^ 5 ) 


3.4 Model error ensemble likelihood 


From assumption (section |3.1| ) and it follows that we can write down the likelihood of Qfc 
given the model error ensemble member x^® 


p(xr(i)iQfc)(x 


(i): 

1 




_1 /virie 

e ^ 


(i))^Qk 


= (i) 


(16) 


where |.| stands for the matrix determinant. 


We emphasize that the existence of the likelihood p(x™®(z)|Qfc), Eq.(16), implies that 


mem¬ 


bers of the model error ensemble X^® can be viewed as observations on the true Q^. 
This is because the likelihood provides the necessary relationship between the data we have 







(x™®(z) here) and the parameter we aim to estimate (Qfc), see also[^ For the whole ensemble 
the likelihood becomes 


N 


p(xriQfc) = oc IQ 


.-K -Nj-rfs 

2 P 2 


meq-l) 


where 


i=l 


S me _ 


N 

JvE 

2=1 


xr(i)xr 


(*)■ 


(17) 


(18) 


is the sample covariance matrix. 


Remark . Equation (18) differs from the conventional sample covariance formula: the ensem¬ 


ble members are not centered by the ensemble mean and the sum is divided by N and not by 
N — 1. These differences stem from our neglect of the uncertainty in b^. In practical problems, 
when we are not so sure about the mean, the conventional sample covariance matrix is to be 
preferred. 


3.5 Predictability ensemble likelihood 

Note that both ordinary observations and model error ensemble members x™®(i) are produced 
outside the filter. The likelihoods Eqs. (15) and (16) relate yk and x^®(i) to the variables (x^ and 


Qfc, respectively), which are independent of the filter, too. So, the two likelihoods do influence 
the filter (they are, in fact, parts of its setup) but not vice versa. 

This is in contrast to the predictability ensemble members x|®(i), which are generated by 
the filter itself. For each k, both the distribution of x^^(i) and the true Pfc are determined by 
the filter’s performance. Therefore, we cannot impose a relationship between x|'^(i) and Pfc. We 
can only try to reveal this relationship. 

In so doing, we note that the true prior covariances are unavailable to the filter (assumption 
[T|. Therefore, the analysis gain matrix Kfc is inevitably inexact Furrer and Bengtsson (2007); 


Sacher and Bartello (2008), which causes the analysis ensemble members x“(i) to be distributed 
with a covariance matrix different from the true posterior covariance matrix Afc. As a result, 
the next-cycle predictability ensemble members x|^^(i) cannot be distributed with the true 
predictability covariance matrix Pfc+i. (For the same reason, members of the traditional forecast 
ensemble cannot have the same conditional distribution as the truth in any situation in 
which Bfc is uncertain.) This important point is further illustrated below in sections 4.6 and 


The conclusion that there is no known relationship between x^ (i) and Pfc entails that the 
likelihood p(x^^(i)|Pfc) is not available and so, strictly speaking, the predictability ensemble 
members cannot be used (assimilated) to update the prior distribution and yield the desired 
posterior distribution of Pfc. In order to come up with a mathematically sound way of extracting 
information on the true Pfc contained in the predictability ensemble X^*^, we use the following 
device. 

First, we postulate the existence of an (explicitly unknown) auxiliary matrix variate random 
variable Ilfc such that the predictability ensemble members x|^(i) are Gaussian distributed with 

f 

the known mean (identified with the deterministic forecast xy and the covariance matrix Ilfc: 


N 


p(xf iHfc) = []p(x?’'^(z)|nfc) oc |nfc|-f 


(19) 


2 = 1 


where is the predictability ensemble sample covariance matrix: 


N 


- x{) (xr(o - x{) 


/^T 


( 20 ) 


2 = 1 
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Second, we assume that the true Pfc has a (known) probability distribution related to 11^. 
Specifically, we assume that 

Pfc|nfc~xw(0,nfc), (21) 


where 9 is the sharpness parameter (see 0. which controls the spread of the distribution of P^ 
around its mean Ilfc (the greater 9 the smaller the spread). 

Now, we observe that we have related to 11^ through the density p(X|'^|nfc), see Eq.( 19), 
and Il/i, to Pfc through the density p(Pfc|nfc), see Eq.(21). The resulting indirect relationship 
between X^*^ and P^ will allow us to assimilate the former in order to update the latter. 

Thus, we have the likelihoods for both ordinary observations and ensemble data. Next, we 
need the prior distribution. 


3.6 Analysis: prior distribution 

The analysis control vector comprises x, P, and Q; we also have the auxiliary variable 11 (a 
nuisance parameter). Note that here and elsewhere we drop the time index k whenever all 
variables in a given equation pertain to the same assimilation cycle k. We have to define a prior 
distribution (recall, denoted by the superscript /) for all these four variables combined. By the 
prior distribution, we mean the conditional distribution given all past assimilated data 
This conditioning is implicit throughout the paper in pdfs marked by the superscript /. We 
specify the joint prior hierarchically: 

/(x,n,p,Q) =p^(n,p,Q)p^(x|n,p,Q) =p^(Q)/(n|Q)p/(p|Q,n)p(x|p,Q). ( 22 ) 


The key feature here (assumed at the start of filtering, i.e. at A: = 1, and proved below for k > 1) 
is that the prior distribution of the state is conditionally Gaussian given P, Q: 


x|P,Q ~ AA(x^,B = P + Q). 


(23) 


Now, consider the priors for the covariance matrices in Eq.(22). Starting with pf (Q), we hy¬ 


pothesize that there is a sufficient statistic and this sufficient statistic is produced by the 
secondary filter as an estimate of Q from past data, see section [3. 9. 3[ Then, from sufficiency, the 
dependency on the past data in p-^{Qk) = p(Qfc|Yi:fc_i) can be replaced by the dependency on 
Q-^, so that P'^(Q) = p(Q|Q'^). Similarly, we postulate that p-^(n|Q) = p(n|P-^), where P-^ is 
also provided by the secondary filter, and that p'f(p|Q^n) = p(P|n), where the latter density 
is defined in Eq.(21). As a result, Eq.p2|) writes 


p- 




x,n,p,Q) =p(Q|Q^)p(n|p^)p(p|n)p(x|B = p + Q). 


Eurther, we model p(Q|Q'^) and p(n|P'^) using the inverse Wishart distribution: 

QIQ-^ ~XW(x,Q^) and BlP-^ ~ XW(</>, P^), 


(24) 


(25) 


where x and cj) are the static sharpness parameters. 

To summarize, the prior distribution is given in Eq.(24), where the first three densities in 
the r.h.s. are inverse Wishart and the last one is Gaussian. Prior to the analysis, we have the 
deterministic forecast x-^ and the five parameters of the three (hyper)prior (inverse Wishart) 
distributions: Q^, P^, X, 4 >: arid 9. Now, we have to update the prior distribution using both 
ordinary and ensemble observations and come up with the posterior distribution. 


3.6.1 Remarks 

1. The conditional Gaussianity is a natural extension of the Gaussian assumption made in the 
KE and the EnKE and is crucial to the HBEE as it enables a computationally affordable 
analysis algorithm. 
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2. The choice of the inverse Wishart distribution is motivated by its conjugacy for the Gaus¬ 


sian likelihood Anderson (2003); Gelman et al. (2004). Conjugacy means that the poste¬ 


rior pdf belongs to the same distributional family as the prior. In our case, the inverse 
Wishart prior is not fully conjugate but it greatly simplifies derivations and makes the 
analysis equations partly analytically tractable. 

3.7 Posterior 


Multiplying the prior Eq.(24) by the three likelihoods, Eqs. (dH), (0, and ( [I^ , we obtain the 
posterior for the extended control vector (x, P, Q, 11): 

p“(x, P, Q, n) = /(x, P, Q, n I X-^ y) oc 

p/(x, P, Q, n) • p(y|x) • p(X-'=|Q) • p(X^’'=|n) = 

[p(Q|Q^)p(X”^^|Q)] • [p(n|P^)p(XP^|n)] • [p(P|n)] •p(x|B = P + Q) -Kylx) (26) 

Note that in densities marked by the superscript a, the dependency on the past and present data 


Yi-fc is implicit. Now, our goal is to transform Eq.(26) and reduce it to the required posterior 
p“(x,P,Q). 


We start by simplifying the expressions in the first two brackets in the third line of Eq. (26). 


These are seen to be the two prior densities, p(Q|Q'^) and p(n|P'^), updated by the respective 
ensemble data but not yet by ordinary observations. Eor this reason, we call them sub-posterior 


densities and denote by the tilde. Eor the inverse Wishart priors, Eq.(25), and the likelihoods 


Eqs. 0 and ( [I^ , the sub-posterior distributions are again inverse Wishart (see 1): 

m) = p(Q|Q0p(X-*^|Q) ~ IWix + N, Q), 


(27) 


p(n) =p(n|p^)p(x”^^|n) ~xw(()) + x,p), 


with the mean values 


Q = 


xQ^ + NS^ 

X + N 


and 


+ NSP^ 

(p + N 


(28) 


(29) 


Next, we eliminate the nuisance matrix variate parameter 11 from the posterior. The standard 
procedure in Bayesian statistics is to integrate 11 out. But in our case we cannot do so analyt¬ 


ically, instead we resort to the empirical Bayes approach Carlin and Louis (2000) and replace 
in the posterior, Eq.(26), 11 with its estimate P (the mean of the sub-posterior distribution 


Eq. (28) defined in Eq. (29)). This allows us to get rid of the second bracket in Eq. (26) (because 


the expression there does not depend on the control vector (x, P, Q) and no longer depends on 
n) and replace the third bracket by 

p(P) =p(P|n = P) ~XW(0,P) (30) 

(see Eq.([^). As a result, we arrive at the following equation for the posterior density 

p“(x,P,Q) (xp(P)p(Q) [p(x|B)p(y|x)], (31) 

where B = P -I- Q and all the terms that contain the state x are placed inside the bracket. 


To reduce the joint posterior Eq.(31) to the marginal posterior of P, Q times the conditional 
posterior of x given P, Q (i.e. to represent the posterior hierarchically), we should integrate x 
out of p“(x, P,Q). This can be easily done because both x-dependent terms in the bracket 


are proportional to Gaussian pdfs w.r.t. x, see Eqs.(23) and (15), and so is their product 


To analytically integrate p“(x, P,Q) over x, we complete the square in the exponent of the 


11 





























p(x|B)p(y|x) expression (technical details are omitted) and take into account that the integral 
of a Gaussian pdf equals one, getting 


^(B|y) = / p(x|B)p(y|x) 


dx oc 


|A|i 


IB 


1 (y-Hxf )T (HBHT+R)-i (y-Hxf) 


(32) 


where the matrix A is defined below in Eq.(37). It is worth noting that /(B|y) defined in 
Eq.([32| is, essentially, the observation likelihood of the matrix B defined as p(y|B): indeed, 
p(y|B) = f p(y|x)p(x|B) dx, hence the notation /(B|y). 

Now we obtain the final posterior 


p“(x,P,Q)=p“(P,Q)-p“(x|P,Q). 


(33) 


Here, from Eqs.(31) and (32), 


p“(P, Q) = / p“(x, P, Q) dx oc p(P) p(Q) l(P + Q|y) 


(34) 


is the marginal posterior. Further, from Eqs.(31) and (34), 


p“(x|P,Q) = ocp(x|B)p(y|x) ~ A/'(m“(B), A(B)), 


(35) 


(where, we recall, B = P + Q) is the conditional posterior. In Eq.(35), the proportionality oc is 
w.r.t. X (because p“(x|P, Q) is a probability density of x). 


m“(B) = + AH’^R"^(y - Hx-^) 

is the conditional posterior expectation of x, and 

A = A(B) = (B-^ + H^R-^H)-^ 

is the conditional posterior (analysis error) covariance matrix. 


(36) 


(37) 


3.7.1 Remarks 

1. Preservation of the conditional Gaussianity in the analysis. The posterior conditional 


distribution of the state p“(x|P, Q), Eq.(35), appears to be Gaussian (coinciding with the 
traditional KF posterior given B = P + Q, therefore Eqs.(36) and (|37l) are exactly the KF 


equations). So, the conditional Gaussianity “survives” the analysis step. 

2. The inverse Wishart priors for the covariance matrices significantly simplify the derivation 
of the posterior distribution, but at the expense of not solving the problem of noisy long¬ 
distance covariances. This implies that covariance localization should be applied to the 
ensemble covariances. 


3. The linear combinations of the prior and ensemble covariance matrices in Eq.(|29|) resemble, 

jedoit and| 


on the one hand, the shrinkage estimator of a covariance matrix proposed by 


Wolf (2004) and, on the other hand, the use of static and ensemble covariances in hybrid 


ensemble variational techniques (e.g. Buehner et ah, 2013 Lorenc et ah, 2014). 


4. Equation (32) shows that observations do influence the observation likelihood of B 


(through the innovation vector y — Hx-^), hence they do influence the marginal posterior 


pa (P,Q), see Eq. (|34[). This is the “mechanism” in the HBEF that provides the desired 


and absent in the KF, EnKF, and HEnKF feedback from observations to the forecast error 
covariances. 
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5. In the classical Bayesian filtering theory outlined in section 2.1, the predictive and filtering 
distributions are conditioned on ordinary observations y. In the HBEF, we explicitly 
condition the posterior on both observation and ensemble data Y. The two conditionings 
lead to different results, but this difference is an inevitable consequence of approximations 
due to the use of the ensemble (Monte Carlo) approach. We will not distinguish between 
them in the sequel. 


3.8 Analysis equations 


Having the posterior p“(x,P,Q), see Eqs.(33)“(35), we now need equations to compute quan¬ 
tities needed for the next assimilation cycle. These are, first, point estimates of x, P, Q (which 
we call deterministic analyses) and second, the analysis ensemble X“®. 


3.8.1 Posterior mean X, P, Q 

The deterministic analyses x“, P“, Q“ are defined as approximations to their respective posterior 
mean values. The latter are given, obviously, by the following equations 


P“ = EP = j y^p“(P,Q)PdPdQ, Q“ = EQ = j y^p“(P,Q)QdPdQ, 


(38) 


B“ = P“ + Q“, 


(39) 


x“ = Ex = EE(x|P,Q) = Em“(P + Q) = j y p“(P, Q) m“(P + Q) dPdQ, 


(40) 


where m“(B) is given by Eq.(36), p“(P,Q) by Eq.(34), the expectation is over the posterior 


distribution, and the integration w.r.t. a matrix is explained in[C| 

The integrals in Eqs.(38) and (40) are not analytically tractable, so we introduce approxi¬ 


mations. We present here two versions of the analysis equations: a Monte Carlo based and an 
empirical Bayes based (the simplest version). 


3.8.2 Monte Carlo based deterministic analysis 


Here, we approximate the integrals in Eqs.(38) and (40) using Monte Carlo simulation. More 


specifically, we employ the importance sampling technique (e.g. Kroese et ah, 2011) with the 
proposal density p(P)p(Q). Generating the Monte Carlo draws P®(i) ~ P(P), Q^(i)~p(Q) 

(where i = 1,... ,M and M is the size of the Monte Carlo sample), and computing B®(i) = 
P®(i) -h Q®(i), we obtain the estimates: 


E"i([B'(i)|y| 


Q“ = 




E*=i^[B^(*)|y]-m“[B^(i) 




(41) 


(42) 


Note that in view of Eq.(32), the resulting analysis is nonlinear in both x-f and y. 


Sampling from an inverse Wishart distribution can be expensive in high dimensions, so we 
propose, next, a cheap alternative. 
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3.8.3 The simplest deterministic analysis 


Here, we neglect the Z(B|y) term in Eq.(34) altogether, thus allowing, as in the HEnKF, no 
feedback from observations to the covariances. The reason for this neglect is that the information 
on P and Q that comes, first, from the prior matrices P-^ and Q-^ and second, from the two 
ensembles and X™'®, summarized in the sub-posterior distributions p(P) and p(Q), is much 
richer than information on P and Q that comes from current observations through the /(B|y) 
term. Indeed, P-^ and Q-^ accumulate vast amounts of past (albeit aging) information on P and 
Q. Model error ensemble members constitute, as we have discussed, N direct observations on Q. 
Predictability ensemble members are N observations on 11 (and so indirectly on P). But there 
is only one set of current ordinary observations, that is, all current observations combined give 
rise to only one (very) noise contaminated observation on HBH"'" -|- R (but note that with the 
known R, this is the only observation on the true B). Therefore, we assume that in Eq.(34) p(P) 


and p{Q) are much more peaked w.r.t. (P, Q) than /(P + Q|y), so that the correction made to 
the sub-posterior by the relatively flat /(P -|- Q|y) is rather small, and in the first approximation 
can be disregarded. This simplification results in the marginal posterior 

p“(P,Q)=p(P).p(Q). (43) 

Both p(P) and p(Q) are inverse Wishart pdfs with the mean values P and Q, respectively, so 

P“ = P and Q“ = Q. (44) 


As for the deterministic analysis of the state, the integral in Eq.(40) remains analytically in¬ 
tractable, so we resort to the empirical Bayes estimate 


x“ = m“(B“), 


(45) 


which is just the KF’s analysis with B“ = P“ -|- Q“ as the assumed forecast error covariance 
matrix. 


3.8.4 Analysis ensemble 


Here, the HBEF follows the stochastic EnKF, see Eq.(13), where = x^^ii) + x^^{i). 


3.9 Forecast step 
3.9.1 Primary filter 

From Eq.Q and assumption]^ we have 

x{ = Exfc|yi:fc_i = Ffc • Exfc_i|yi:fc_i = FfcX^_i, (46) 

which is essentially the KF’s Eq.Q. 


3.9.2 Preservation of the conditional Gaussianity 

Let us look at the basic state evolution Eq. 0- In that equation, ek\Q,k is Gaussian and inde¬ 
pendent of Xfc_i. Further, in the posterior at step A: — 1, as it follows from Eqs.(35) and (36), 
Xk-i is conditionally Gaussian given A^-i- Therefore, from x^ = FfcXfc_i -|- we obtain that 
Xfcl Afc_i, Qk is Gaussian. But if we examine the distribution in question Xfc|Pfc, Q^, we observe 
that with the additional technical assumption that F^ is invertible, conditioning on P^ is equiv¬ 
alent to conditioning on Afc_i (in view of Eq.([^). Consequently, Gaussianity of Xk\Ak-i,Qk 
implies Gaussianity of XfcjPfc, Q^. 

Thus, the basic HBEF’s conditional Gaussianity assumption is preserved at the forecast step 
(as well as the analysis step, see remark]^ in section 3.7.1). 
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3.9.3 Secondary filter 

At the forecast step, the secondary filter has to produce P-^ and Q-^ at the next assimilation 
cycle. We postulate persistence as the simplest evolution model for both P and Q, so that 

P{ = P^i and Qi = QU. (47) 


3.9.4 Generation of the forecast ensembles 


The predictability ensemble is generated by simply applying the forecast operator Ffc to 
the analysis ensemble members see section 


3.8.4 


The model error ensemble Xf 


IS 


generated by directly sampling from the model error distribution A/’(0, Q^). 


4 Numerical experiments with a one-variable model 


In this proof-of-concept study, we tested the proposed filtering methodology in numerical exper¬ 
iments with a one-variable model of “truth”, so that we were able to draw justified conclusions 
on fundamental aspects of the HBEF. Note that in the case of the one-dimensional state space 
we follow the default multi-dimensional notation but without the bold face. 

We compared the HBEF with 


1. The reference KF that has access to the “true” model error variances Qk and is allowed 
to directly compute Pu = FkAk-iF^ = 

2. The stochastic EnKF with the optimally tuned variance inflation factor. 

3. The Var, the filter based on the analysis that uses the constant B (the abbreviation Var 
stands for the variational analysis, which normally uses the time-mean B). 

4. The HEnKF, in which, we recall, the prior mean is excluded from the analysis control 
vector in order to make it comparable with the HBEF. 


We evaluated the performance of each filter by two criteria. The first (main) criterion reflects 
the accuracy of the primary filter measured by the root-mean-square error (RMSE) of the filter’s 
deterministic analysis of the state. For any filter except the Monte Carlo based HBEF, the de¬ 
terministic analysis is defined to be the standard KF’s analysis computed with the deterministic 
forecast as the background and the forecast error covariance matrix provided by the respective 
filter. (With the forecast model described below and small ensemble sizes, the deterministic 
forecast appeared to work better than the ensemble mean.) 

The second criterion represents the accuracy of the secondary filter in terms of the RMSE 
of the filter’s estimates of B (for details, see section 4.8 below). Note that by the RMSE we 
understand the root-mean-square difference with the truth (the true B is defined below in section 


4 ^ . 

Besides the formal evaluation of the performance of the new filter, we also examined some 
other important aspects of the technique proposed. First, we verified that the conditional 
distribution of the state given the covariances was indeed Gaussian. Second, we confirmed that 
the forecast ensemble variances were often systematically different from the true error variances. 
Third, we evaluated the role of the feedback from observations to the covariances, which is 
present in the HBEF with the Monte Carlo based analysis and absent in the other filters. 

To conduct the numerical experiments presented in this paper, we developed a software 
package in the R language. The code, which allows one to reproduce all the below experiments, 
and its description are available from https://github.com/rakitko/hbef. 
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4.1 Model of “truth” 


We wish the time series of the truth to resemble the natural variability of geophysical, specifically, 
atmospheric fields like temperature or winds. We would also like to be able to change various 
aspects of the probability distribution of our modeling true time series, so that the model of 
truth be conveniently parametrized, with parameters controlling distinct features of the time 
series distribution. 

4.1.1 Model equations 

We start by postulating the basic discrete-time equation 

Xk = FkXk-i + (Tk^k, (48) 

where Xk is the truth, and Uk are the scalars to be specified, and ~ AA(0,1) is the driv¬ 


ing discrete-time white noise. Given the sequences {Fk} and {(Tfc}, the solution to Eq.(48) is 
a Gaussian distributed non-stationary time series. The forecast operator T). determines the 
time-dependent time scale of Xk or, in other words, controls the degree of stability of the sys¬ 
tem: forecast perturbations are amplified if |T).| > 1 and damped otherwise. Both {T^} and 
{ck} together determine the time-dependent variance Vk of the random process Xk- The noise 
multiplier cr^ is the model error standard deviation: Qj^ = cr^. 

In nature, both the variance and the temporal length scale exhibit significant chaotic day- 
to-day changes. In order to simulate these changes (and thus to introduce intermittent non- 
stationarity in the process Xk), we let Fk and ak be random sequences by themselves, thus 
making our model doubly stochastic Tjpstheim (1986). Specifically, let Fk be governed by the 
equation: 

Fk- F = ^i{Fk-i - F) + crpel, (49) 


where /r G (0,1) is the scalar controlling the temporal length scale of the process T)., ap is 
the scalar controlling, together with fi, the variance of Fk, is the driving A/'(0,1) white 
sequence, and F is the mean level of the F^ process. Equation (49) is the classical first-order 
auto-regression and its solution Fk is a stationary random process. 


Further, let ak (see Eq.(48)) be a log-Gaussian distributed (which prevents a from attaining 


unrealistically close to zero values and makes it positive) stationary time series: 

cjfc = exp(Sfc) with Sfc = -k aT,e^. 


(50) 


Here, x, cts, and have the same meanings as their counterparts in Eq.(49): /r, ap, and , 
respectively. We finally assume that the three random sources in our model, namely, Sk, ef', and 
are mutually independent. Note that the process Xk is conditionally, given {Fk} and {ak}, 
Gaussian, whereas unconditionally, the distribution of Xk is non-Gaussian. 


4.1.2 Comparison with the existing models of “truth” 

The difference of our model from popular simple nonlinear deterministic models, e.g. the three- 


variable Lorenz model 

Lorenz ( 

1963 

) or discrete-time maps used to test data assimilation tech- 

niques (say, logistic or 

Henon maps 

Du and Smith 

(2012)), is that in the deterministic models 


instabilities are curbed by the nonlinearity, whereas in our model, these are limited by the 
time the random process \Fk\ remains above 1. The nonlinear deterministic models are chaotic 
whereas our model is stochastic. 

One advantage of our model of truth is that it allows us to know not only the truth itself but 


also its time-specific variance 14. Indeed, running the model Eq.(48) L times with independent 


realizations of the forcing process Ek (and with the sequences Fk and ak fixed), we can easily 
assess 14 using square averaging of Xk over the L realizations. 
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Another advantage of the proposed model of truth is that it has as many as five independent 
parameters, F, fi, x, ap, and as, which can be independently changed and which control 
different important features of the stochastic dynamical system Eqs.(48)-(50). These features 
include magnitudes and time scales of the solution Xk, the model error variance Qk, and the 
degree of stability of the system. Note that these aspects affect the behavior of not only the 
truth but also the filters we are going to test. 

In addition, the linearity of our model of truth allows the use of the exact KF as an unbeatable 
benchmark, which again would not be possible with nonlinear deterministic models of truth. 

Finally, we remark that the model defined by Fqs.(48)-(50) is, actually, nonlinear regarded 
as a state-space model, i.e. if the model equations are written as a Markov model for the vector 
state variable Sfc)"''. 


4.1.3 Model parameters 

To select the five internal parameters of the system in a physically meaningful way, we related 
them to the five external parameters: the mean time scale fx of the process Xk, the time scales 
Tp and te of the processes Fk and the probability of the “local instability” tt = P (|Ffc| > 1), 
and the variability in the system-noise variance, which we quantify by s.d. the standard 
deviation of We specified the external parameters and then calculated the internal ones; we 
omit the respective elementary formulas. 


4.2 The “default” configuration of the experimental system 
4.2.1 Model 


In order to assign specific values to the five external parameters, we interpreted our system, 
Fqs.(48)-(50), as a very rough model of the Farth atmosphere. Specifically, we arbitrarily pos¬ 
tulated that one time step in our system corresponds to 2 hours of time in the atmosphere. This 
implies that the weather-related characteristic time scale of 1 day in the atmosphere corresponds 
to the mean time scale fx = 12 time steps for our process Xk- This was the default value for 
fx in the experiments described below. Further, for the “structural” time series Fk and 
we specified somewhat longer time scales, Tp = r-p = l.hfx- Next, the default value of vr was 
selected to be equal to 0.05 and s.d. equal to 0.5—these two values gave rise to reasonable 
variability in the system. We also examined effects of deviations of vr and s.d. from their 
default values, as described below. The sensitivity of our results to the other parameters of the 
model appeared to be low. 


4.2.2 Observations 

We generated observations by applying Eq. (§ every time step with Hk = 1 and % ~ AA(0, R) 
(so that the observation error variance Rk = R is constant in time). To select the default value of 
R, we specihed the default ratio B/R. In meteorology, for most observations, this forecast error 
to observation error ratio is about I, but only a fraction of all system’s degrees of freedom is 
observed. In our scalar system, the only degree of freedom is observed, so, to mimic the sparsity 
of meteorological observations, we inflated the observational noise and so reduced the default 
ratio B/R to he equal to 0.1. This appeared to roughly correspond to the default VR = 9. We 
also examined the effect of varying R: from the well observed case with B/Rc^ 10 to the poorly 
observed case with B/R"^ 0.01. 

4.2.3 Ensemble size 

In real-world atmospheric applications N is usually several tens or hundreds whilst the dimen¬ 
sionality of the system n is up to billions. In our system n = 1, so we chose N to vary from 2 
to 10 with the default value of = 5. 
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4.2.4 Version and parameters of the HBEF 


By default, the simplest version of the HBEF was used, see section 3.8.3 To complete the 
specification of the default HBEF, it remained to assign values to the three sharpness parameters 
X, (j), and 9, which was done by manual tuning. The default respective values were x = 5, iji = 30, 
and 9 = 2. 


4.2.5 Other parameters of the experimental setup 

In the EnKF, the tuned variance inflation factor was 1.005. In the HEnKF, the best sharpness 
parameter was found to be 0 = 10. If not stated otherwise, the below statistics were computed 
with the length of the time series (the number of assimilation cycles) equal to 2 • 10^. 


4.3 Estimation of the true prior variances and signal variances 

For an in-depth exploration of the HBEF’s secondary filter, knowledge of the true forecast error 
variance is very welcome, just like exploring the behavior of a primary filter is facilitated 
if one has access to the truth x^- In this section, we show that our experimental methodology 
enables the assessment of the true Bk as accurately as needed. 

We start by noting that each filter produces estimates of its own forecast error (co)variances 
Bk- By construction, the (exact) KF produces forecast error variances that coincide with the 
true Bk- All approximate filters (including those considered in this study) can produce only 


estimates of the B^, e.g. the HBEF produces the posterior estimate see Eq.(39). It is worth 
stressing that B^ produced by the KF cannot be used as a proxy to the true Bk of any other 
filter because the error (co)variances are filter specific. The true Bk for each filter and each k 
can be assessed as follows. 

Recall that Bk is the conditional (given all assimilated data) forecast error variance. Two 
aspects are important for us here, (i) Bk is the forecast error variance] this suggests that it can 
be assessed by averaging squared errors of the deterministic forecast, (x^ — XkY■ (ii) Bk is the 
conditional error variance; this means that Bk depends on all assimilated so far observations, 
so in order to assess the true Bk, one has to perform the averaging of squared errors only 
for those trajectories of the truth and those observation errors that give rise to exactly (or 
even approximately) the same observations Bk is conditioned upon. This is a computationally 
unfeasible task even for a one-variable model. But the assessment of the unconditional forecast 
error covariance matrix Ifk is feasible and parallels the estimation of the true variance 14 outlined 
in section [4.1.21 

Specifically, we performed L independent assimilation runs, in which the sequences of i4 
and cjfc (as well as the sequence of the observation operators) were the same (thus preserving the 
specificity of each time instance), whereas the sequences of £k, rjk, and the random sources in the 
filters related to the generation of the analysis ensembles were simulated in each run randomly 
and independently from the other runs. Then we used the mean squared forecast error as a 
proxy to the true Bk- 

(51) 


Bk = {{xl-Xkf), 


where the angle brackets (.) denote averaging over the L runs 
As noted in remark in section 2.4.1 


In our experiments L = 500. 
the KF’s conditional Bk does not depend on the 
assimilated observations at all and thus coincides with the unconditional Bk- This is true for 
any non-adaptive EnKF, the HEnKF, and the simplest version of the HBEF as well. But for 
the HBEF with the Monte Carlo based analysis, where there is feedback from observations to 
the covariances, this is not exactly the case. However, as we discussed in section [3.8.3 the 
influence of observations on the posterior estimates of Pk and Qk (and thus Bk) is relatively 
weak, so we used Bk a proxy to Bk for the HBEF with the Monte Carlo based analysis as 
well. To simplify the notation, we do not distinguish (for any filter in question) between the 
true conditional variance Bk, the true unconditional variance Bk: Hi® proxy Bk- 
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Thus, for any time k, we had at our disposal the variance of the truth 14 and each filter’s 
true forecast error variance Bj.. 


4.3.1 Remarks 


1. Our approach here is similar to that proposed in Bishop and Satterfield] (2013). The 
difference is that in Bishop and Satterfield (2013) the truth is deterministic (so that 14 
cannot be assessed) and the forecast model is stochastic, whereas our model assumes that 
the truth is stochastic whilst the forecast model is deterministic. 


2. In order to avoid confusion with the filters’ internal estimates of (e.g. we use the 
terms assessment or proxy to refer to which externally evaluates the actual performance 
of the filter using the access to the truth. 

3. All numerical experiments presented in this paper were carried out with one and the same 
arbitrarily selected realization of the structural time series i4 and Cfc, so that for any k, 
the signal variance 14 is the same for all plots below. This holds also for any filter’s true 
forecast error variance B^, facilitating comparison of the different plots. 


4.4 Model’s behavior 

Figure [T] displays typical time series segments of Fk and ak, as well as of the true signal variance 
14 and the HBEF’s true forecast error variance Bk. One can see that the variance 14 of the 
signal Xk can vary in time by as much as some two orders of magnitude, so the process Xk 
was significantly non-stationary, as it is the case, say, in meteorology. One can also observe 
that the system-noise standard deviation was correlated with both I 4 and B^ (which is not 
surprising). Correlation between Fk and both I 4 and Bk was also positive but lower. Both I 4 
and Bk tended to be high when both |i4| and Uk are high (low-predictability events), and low 
when both |i4| and ak are low (high-predictability regimes). In general, the model behaved as 
expected. 


4.5 Verifying the conditional Gaussianity of the state given {xf,B) 

From the equation 

x\xf,P,Q^Af{x^,B = P + Q), (52) 

it is obvious that Xk\xl, Bk is Gaussian if and only if so is Xk — xl\Bk. With the true Xk and Bk in 
hand, we were able to verify if indeed —| Rfc ~ AA(0, Bk). Figj^left) presents the respective 
q-q (quantile-quantile) plots. (Note that for a Gaussian density, the q-q plot is a straight line, 
with the slope proportional to the standard deviation of the empirical distribution.) 

One can see that p{xk — x{|Rfc) can indeed be very well approximated by a Gaussian density 
for low, medium, and high values of Bk (the three curves in FigJ^left)). In contrast, the 
unconditional density p{xk — x^) is significantly non-Gaussian with heavy tails, see Figj|(right). 
So, the conditional Gaussianity of the state’s prior distribution is confirmed in our numerical 
experiments. 


4.6 The forecast ensemble members are not drawn from the same distribu¬ 
tion as the truth 

Here, we explore the actual probability distribution of the forecast ensemble members at any 
given time k. We demonstrate that for both the EnKF and the HBEF, the variance of this 
distribution is often substantially biased with respect to the respective true error variance. 

We start by stating that in a single data assimilation run, we cannot find out from which 
(continuous) probability distribution the forecast ensemble members at time k are drawn (be¬ 
cause the ensemble size is small, see assumption]^. But, following section 4.3, for each filter. 
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Time 


Figure 1: Typical time series of: (a) The forecast operator Fk, (b) The model error standard 
deviation cr^, (c) The variance of the truth I 4 , and (d) The true background error variance Bj. 
for the HBEF. The light gray (pink in the web version of the article) vertical stripes indicate 
events when > 1. The dark gray (blue in the web version of the article) vertical stripes 
indicate events when \Fk\ was relatively low. 20 
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Figure 2: The Gaussian q-q plots for the conditional pdf — Xk\Bk) (left) and the uncondi¬ 
tional pdf — Xk) (right). In the left panel, the three curves correspond to the three intervals 
of Bk indicated in the legend. 


we had at our disposal a number of assimilation runs that share the sequence of Then, if 
in each assimilation run, the forecast ensemble members were drawn from the distribution with 
the variance Bk (the “null hypothesis”), we would have E 5^ = Bk, where Sk is the ensemble 
(sample) variance and the expectation is over the population of independent assimilation runs. 
To check if this latter equality actually holds, we estimated ESk as the sample mean (Sk) for 
each k separately using the sample of L assimilation runs. 

The resulting time series of the biases {Sk) — Bk for the EnKF and the HBEF are displayed 
in FigJ^ (the two lower curves) along with their respective 95% bootstrap confidence intervals. 
The true error variances themselves Bk are also shown in Figj^ (the two upper curves) to give 
an impression of the relative magnitude of the biases in {Sk)- 

One can see that that the biases in the ensemble variances were significantly non-zero when 
the true Bk were relatively large. For the EnKF, the deviation of {Sk) from Bk sometimes reached 
50% of Bk- For the HBEF, the biases were less but still significant. In the small forecast error 
regimes, the biases became insignificant. It is also interesting to notice that the large biases 
were mostly negative implying that the filters were under-dispersive (despite the tuned variance 
inflation in the EnKF). Over a longer time window of 10^ time steps, the confidence interval did 
not contain zero (i.e. the bias was significantly non-zero) 78% of time for the EnKF and 62% of 
time for the HBEF. 

Thus, we have to reject the null hypothesis and admit that forecast ensemble members are 
often taken from a distribution which is significantly different from the true one. This has two 
implications. First, the uncertainty in the sample covariances is not only due to the sampling 
noise but also due to an accumulated in time systematic error component. Second, the biases in 
the sample covariances warrant the introduction of the actual predictability ensemble covariance 
matrix 11^ that differs from the true covariance matrix (see section 3.5). 

The above results are worth comparing with those of Bishop and Satterfield Bishop and| 


Satterheld (2013), who found insignificant biases in the ensemble variances, see their Fig.2. 


One dissimilarity between their and our experiments was that an ensemble transform version 
of the EnKF was used in Bishop and Satterfield (2013). We employed the ensemble transform 
technique for both the EnKF and the HBEF and found that this led to some improvements 
but did not remove the biases in Sk (not shown). A plausible reason for the difference in the 
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Figure 3: The two lower curves: biases in the forecast ensemble variances (with the 95% confi¬ 
dence intervals) for the EnKF and the HBEF. The two upper curves: the respective true error 
variances B^. 
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Figure 4: The filters’ analysis RMSEs of the state (with the reference-KF analysis RMSE sub¬ 
tracted) as functions of the ensemble size N (top, left), the observation error standard deviation 
\fR (top, right), the degree of the system’s intermittent instability vr (bottom, left), the vari¬ 
ability in the model error standard deviation s.d. S (bottom, right). 


conclusions is that the system in Bishop and Satterfield (2013) was much better observed than 
ours (they used R which was much less than the mean 14, whereas in our study R was several 
times larger than the mean 14 ). 


4.7 Verifying the primary filters 

Here, we examine the accuracy of the state estimates for the HBEE and the other filters (the 
Var, the EnKE, and the HEnKF). In the below figures, we display their analysis RMSEs with 
the reference-KE analysis RMSEs subtracted. 

Eigure|^top, left) shows the RMSEs as functions of the ensemble size N. One can see that 
the HBEE was by far the best filter. For small N < 3, the Var became more competitive than 
the EnKF and the HEnKF, but still worse than the HBEF. 

Figure [^top, right) shows the RMSEs as functions of '/R. Again, the HBEE performed the 
best. Its relative superiority was especially substantial for the smaller values of v/R- This can 
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Figure 5: RMSEs in B'^ produced by the EnKF and the HBEF. 


be explained by the prevalence of Q (which is more rigorously treated in the HBEF) over P 
(which is only sub-optimally treated in the HBEF) in this regime. 

Figure [^bottom, left) shows the RMSEs as functions of vr = P (|F)c| > !)• One can see that 
the HBEF was uniformly and significantly better than the other filters. Note that all the filters 
gradually deteriorate w.r.t. the reference KF as the system becomes less stable (i.e. as tt grows), 
which is meaningful because errors grow faster in a less stable system. 

Figure [^bottom, right) displays the RMSEs as functions of the degree of intermittency in 
the model error variance quantified by s.d. E. We see that the HBEF was still uniformly and 
substantially better than the EnKF and the HEnKF. For the smallest values of s.d. S, the Var 
became superior to the EnKF and the HEnKF and only slightly worse than the HBEF. The 
fact that the Var worked relatively better for the small s.d. E can be explained by noting that 
in this regime, when the variability in Q was low, the forecast error statistics were less variable 
and so the constant Var’s B was relatively more suitable. 

Thus, in terms of the analysis RMSEs, the HBEF demonstrated its overall superiority over 
the competing EnKF, HEnKF, and Var filters. 


4.8 Verifying the secondary filters 


Recall that the HBEF’s secondary sub-filter produces the posterior estimate B^ = 
of its true forecast error variance B^- The HEnKF yields its B^ as described in item (hi) in 
section 2.7.1 The Var uses the constant B as an estimate of B^, so we associate B with its B^. 
Similarly, we identify the EnKF’s inflated ensemble variance Sk with its 

In this section, we examine the errors — R^, with the filter specific R^ assessed following 


section 4.3 Having the true Rfc for each filter, we computed the RMSE in its R^ estimates using 
averaging over the L independent assimilation runs as A^. = {{B^ — Rfc)^). The resulting 

for the HBEF and the EnKF are depicted in Figj^ where the almost uniform and substantial 
superiority of the HBEF is evident. 

Having square averaged A^ over time, we obtained the time mean RMSEs in R^. In a 
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similar way we computed the biases in The results of an experiment with 10^ time steps are 
collected in Table where it is seen that the HBEF was much more accurate in estimating its 
Bk than the Var, the EnKF, and the HEnKF in estimating their respective true forecast error 
variances. 

Table 1: Accuracy of the filters’ estimates of their own forecast error variance B^ 


Filter 

Error bias 
mean (P^ - P^) 

RMSE 

rms (P^ - Pfc) 

Mean true P 
mean (P^) 

Var 

-0.9 

6.5 

7.6 

EnKF 

-1.4 

6.2 

7.5 

HEnKF 

-1.8 

4.4 

7.2 

HBEF 

-0.5 

3.2 

7.0 


4.9 Role of feedback from observations to forecast error covariances 


The HBEF with the Monte Carlo based analysis (section 3.8.2) provides an optimized way to 


utilize observations in updating P and Q. In the default setup, this capability did not lead to 
any improvement in the performance scores (not shown), but it became significant when the 
filter’s model error variance was misspecified. 

Specifically, we let all the filters (including the KF) “assume” that the model error variance 
Qk equals the true one multiplied by the distortion coefficient qdistort- For several values of 
Qdistort in the range from 1/16 to 16, we computed the RMSEs of the analyses of the state for 
all filters and plotted the results in Figj^ In the HBEF with the Monte Carlo based analysis, 
the size of the Monte Carlo sample was M = 100, see Eqs.(41 )-(42). 


To make the effect more pronounced, the observation error standard deviation was reduced 
to i/R = 1. 

From Fig one can see that the overall performance of the HBEF with the Monte Carlo based 
analysis was better than the performances of the other filters, including, we emphasize, the (now, 
inexact) KF. The observations-to-covariances feedback present in the Monte Carlo based HBEF 
(and absent in the other filters) appeared especially useful for qdistort < 1- The improvement 
was bigger for qdistort < 1 than for qdistort > 1 because an underestimation of the forecast 
error covariances is potentially more problematic for any filter. Indeed, the overconfidence 
in the forecast leads to an underuse of observations and in extreme cases can even lead to 
filter divergence. This is why the settings with qdistort < 1 left more room for improvement, 
particularly due to the feedback from observations to the covariances. 

Another interesting conclusion can be drawn from comparing the Monte Carlo based version 
of the HBEF with the optimally tuned parameter 6 (asterisks in FigJ^ and the same version of 
the HBEF but with 6 = oo (crosses). Recall that 0 controls the difference between the variance 
H of the distribution of the predictability ensemble members and the true variance P. In the 
setting with 9 = oo, the HBEF “assumes” that H = P. Figure clearly shows that it was 
indeed beneficial to get away from the traditional assumption H = P. This again justifies our 


suggestion (see section 3.5) to allow the ensemble distribution to be different from the true one. 


5 Discussion 

5.1 Comparison with other approaches 


The HBEF has two immediate predecessors, the HEnKF Myrseth and Omre (2010) and the 


EnKF-N Bocquet (2011); Bocquet et al. (2015). The HBEF differs from the HEnKF in the 
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Figure 6: Analysis RMSEs of the state for the filters which used the wrong Q. The latter was 
specified to be the true Q multiplied by the distortion coefficient qdistort- 


following aspects. First, in the HBEF we treat Q and P separately instead of using the total 
background error covariance matrix B. Second, the HBEF’s forecast step is based on the per¬ 
sistence forecasts for the posterior point estimates of Q and P instead of that for the analysis 
error covariance matrix. These two improvements have led to the substantially better perfor¬ 
mance of the HBEE as compared to the HEnKF. Another difference from the HEnKF is that 
the Monte Carlo based HBEF permits observations to influence Q and P. Experimentally, this 
latter featnre appeared to be beneficial only when Q was significantly misspecified, though. 

As compared to the EnKF-N, which integrates B out of the prior distribution, the HBEF 
explicitly updates the covariance matrices. This introduces memory in the covariances, which, 
as we have seen in the numerical experiments, can be beneficial. 

In contrast to both the HEnKF and the EnKF-N, the HBEF in its present formulation does 
not treat the uncertainty in the prior mean state vector (this may be worth exploring in the 
future). But the HBEF systematically treats the uncertainty in Q, which was assumed to be 
known in Myrseth and Omre (2010) and equal to zero in Bocquet (2011); 

(2012); Bocquet et al.| (2015). 


Bocquet and Sakov 


5.2 Restrictions of the proposed technique 

First, the HBEF heavily relies on the conditional Gaussian prior distribution of the state. It 
is this assumption that greatly simplifies the analysis algorithm, but in a nonlinear context, it 
becomes an approximation, whose validity is to be verified. 

Second, the HBEF makes use of the inverse Wishart prior distribution for the covariance 
matrices. There is no justification for this hypothesis other than partial analytical tractability 
of the resulting analysis equations, so other choices can be explored. 
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5.3 Practical applications 


In order to apply the proposed technique to real-world high-dimensional problems, simplihcations 
are needed because the n x n covariance matrices will be too large to be stored and handled. 
The computational burden can be reduced in different ways. Here is one of them. First, let the 
covariances to be defined on a coarse grid. Second, localize (taper) the covariances and store 
only non-zero covariance matrix entries. Third, use the simplest version of the HBEF. 

Another possibility is to fit a parametric covariance model to current covariances and impose 
persistence for the parameters of the model. In this case, the simplest version of the HBEF would 
become close to practical ensemble variational schemes, but with climatological covariances 
replaced by evolving recent-past-data based covariances. 

In high dimensions, the persistence forecast for the covariances seems to be worth improving. 
Specifically, one may wish to somehow spatially smooth and Q,%_i in Eq.(47)—because 

it is meaningful that smaller scales in and have less chance to survive until the 

next assimilation cycle than larger scales. Another way to improve the empirical forecast of 
the covariance matrices is to introduce a kind of “regression to the mean” making use of the 
time mean covariances. This would imply that the HBEF would cover not only EnKF but also 
ensemble variational hybrids as a special case. 

The ultimate goal with the HBEF will be to obtain effective covariance regularization as 
a by-product of the hierarchical analysis scheme without using any ad-hoc device (as it was 


proposed for the EnKF-N in Bocquet (2011) and partially tested in Bocquet et al. (2015)). 


6 Conclusions 


The progress made in this study can be summarized as follows. 


• We have acknowledged that in most applications, the EnKF works with: (i) the explic¬ 
itly unknown and variable model error covariance matrix Q^, (ii) the partially known 
(through ensemble covariances) background error covariance matrix. Under these explicit 
restrictions, we have proposed a new Hierarchical Bayes Ensemble Filter (HBEF) that op¬ 
timizes the use of observational and ensemble data by treating and the predictability 
covariance matrix Pfc as random matrices to be estimated in the analysis along with the 
state. The ensemble members are treated in the HBEF as generalized observations on the 
covariance matrices. 


• With the new HBEF filter, in the course of filtering, the prior and posterior distributions 
of the state remain conditionally (given Pk,Qk) Gaussian provided that: (i) it is so at 
the start of the filtering, (ii) observation errors are Gaussian, (hi) the dynamics and the 
observation operators are linear, and (iv) model errors are conditionally Gaussian given 
Qfc. Unconditionally, the prior and posterior distributions of the state are non-Gaussian. 


The HBEF is tested with a new one-variable doubly stochastic model of truth. The 
model has the advantage of providing the means to assess the instantaneous variance of 
the truth and the true filter’s error variances. The HBEF is found superior the EnKF 
and the HEnKF Myrseth and Omre (2010) under most regimes of the system, most data 
assimilation setups, and in terms of performance of both primary and secondary filters. 


• The availability of the true error variances has permitted us to experimentally prove that 
the forecast ensemble variances in both the EnKF and the HBEF are often significantly 
biased with respect to the true variances. 


• It is shown that the HBEF’s feedback from observations to the covariances can be benefi¬ 
cial. 
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The simplest version of the HBEF is designed to be affordable for practical high¬ 
dimensional applications on existing computers. 
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A Inverse Wishart distribution 


In Bayesian statistics, the inverse Wishart distribution (e.g. Gupta and Nagar (1999); Anderson| 
(2003); Gelman et al. (2004)) is the standard choice for the prior distribution of a random co¬ 


variance matrix, because inverse Wishart is the so-called conjugate distribution for the Gaussian 


likelihood, e.g. Anderson (2003); Gelman et al. (2004). The inverse Wishart pdf is defined for 


symmetric matrices and is non-zero for positive definite ones: 


p{Z) oc 


1 


t/+n + l 

Z 2 


,-itr{Z-lS) 


(53) 


where n > n + 1 is the so-called number of degrees of freedom (which controls the spread of the 
distribution: the greater u, the less the spread) and S is the positive definite scaling matrix. 
Using the mean value Z = EZ = El/(i/ — n — 1) instead of the scaling matrix allows us to 


reparametrize Eq.(53) as 


p{z)=p{z\e,z) 
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oc 
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Z 2 




(54) 


where we have introduced a new scale parameter 6 = v — n — \ >0, which we call the sharpness 


parameter (the higher 9, the narrower the density). We symbolically write Eq.(54) as 

Z ~XW(0,Z). 


(55) 


We prefer our parametrization [9, Z) to the common one {v, El) because Z has the clear meaning 
of the (important) mean Z matrix. Summarizing, the inverse Wishart pdf has two parameters: 
the sharpness parameter 9 (a scalar) and the mean Z (a positive definite matrix). 


30 






























B Assimilation of conditionally Gaussian generalized observa¬ 
tions in an update of their covariance matrix 

Here, we outline, following e.g. |Gelman et alT (2004), the procedure of assimilation of independent 
draws from the distribution AA(m, Z), where m is the known vector and Z the unknown random 
symmetric positive definite matrix, whose prior distribution is inverse Wishart with the density 
specified by Eq.(54). 

Let us take a draw x-(i)|Z ~ AA(m, Z), which we interpret as a member of an ensemble. 
Then, obviously. 


p(x^(i)|Z) oc 


1 


, 1 

Z 2 


— i(x®(i) —m)^Z ^(x°(i)—m) 


e 2 


(56) 


We stress that Eq.(56) is nothing other than the likelihood of Z given the ensemble member 
x®(i). Further, having the ensemble X® = (x®(l),..., x®(V)) of N independent members all 
taken from AA(m, Z), we can write down the respective ensemble likelihood as the product of the 
partial likelihoods: 


^(X^^IZ) oc 
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Z T 




e 2 


, , N 

Z 2 


(57) 


where 


1 ^ 

S = — ^(x‘=(z) - m) (x®(i) - m) 


T 


(58) 


i=l 


is the sample covariance matrix. But having the likelihood p(X®|Z) means that X'^ (and its 
members x®(z)) can be regarded and treated as (generalized) observations on Z. In particular, 
the ensemble can be assimilated in the standard way using the Bayes theorem. Indeed, having 
the prior pdf of Z, Eq.(54), we obtain the posterior 
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where 


= e + N 


and 


Z“ = 


9Z + NS 
9 + N ' 


(60) 


In the right-hand side of Eq.(59), we recognize again the inverse Wishart pdf (hence its conju- 
gacy), see Eq.(54), with 0“ being the posterior sharpness parameter and Z“ being the posterior 
mean of Z. Consequently, Z“ is the mean-square optimal point estimate of Z given both the 
prior and ensemble information. So, we have optimally assimilated the (conditionally Gaus¬ 
sian) ensemble data to update the (inverse Wishart) prior distribution of the random covariance 
matrix. 


C Integral w.r.t. a matrix 

For a general n x n-matrix C, the integral f /(C) dC of a scalar function /(C) over the space of 
all matrices with real entries is defined as follows. First, we vectorize C, i.e. build the vector C of 
length that comprises all entries of C. Then, we simply identify f /(C) dC with f /(C) dC, 
that is, with the traditional multiple (Lebesgue or Riemann) integral over the Euclidean space 
of dimensionality n^. 

The integral w.r.t. a symmetric positive definite matrix is defined in a similar way. The 
difference from the general matrix case is that the vectorization here involves collecting in C 
only algebraically independent matrix entries (e.g. the upper triangle of C) and the multiple 
integral is over the set (the convex cone) of those C that correspond to positive definite matrices. 
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List of main symbols 


()a 

0 

()/e^()ae 
()me^ ()p 


e 


(•) 

A 

B 

F 

H 

i 

K 

k 

L 

^(B|y) 

n 

N 

P 

P,Q,R 

S 

Vk 

X 

x“ 

x-^ 

x-(i),x®(i) 

X 

y 

Y 

xw 

AA(m, B) 
e 

V 

vr 

a 

E 

rms, RMSE 
s.d., Var 
tr 
oc 


1 : k 


posterior (analysis) pdf (i.e. conditioned on past and current data) and its 
parameters 

prior (forecast) pdf (i.e. conditioned on past data) and its parameters 
sub-posterior pdf (i.e. conditioned on past data and current ensemble data) 
and its parameters 

forecast ensemble / analysis ensemble 
model error ensemble / predictability ensemble 
time mean value 

average over L independent realizations of the truth / assimilation runs 

posterior (analysis error) covariance matrix 

prior (forecast error) covariance matrices 

forecast operator 

observation operator 

ensemble member index 

Kalman gain matrix 

time instance index 

number of independent assimilation runs 
observation likelihood of the matrix B 
posterior mean x given P, Q 
dimensionality of state space 
ensemble size 

probability density function (pdf) 

predictability error / model error / observation error covariance matrix 

sample (ensemble) covariance matrix 

Varxfc 

state vector, “truth” 

posterior mean vector and its approximations (deterministic analysis) 
prior mean vector (identihed in this study with the deterministic forecast) 
ensemble member 
ensemble 

observation vector 

observation and ensemble data combined 

inverse Wishart distribution (parametrized according to Q 

Ganssian distribntion with the mean m and covariance matrix B 

model error (system noise) vector 

observation error vector 

sharpness parameters for the inverse Wishart pdfs 

portion of time the process is greater than 1 in modulus 

(time-specific) model error standard deviation 

expectation operator 

root-mean-square value / error 

standard deviation / variance 

matrix trace 

proportionality 

has (corresponds to) the probability distribution 
concatenation from the time instance 1 to the time instance k 
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